Statistical Match Procedure Used in the 2010 and Later Baselines

The statistical match procedure used beginning with the 2010 baseline is an unconstrained nearest-neighbor match similar to that used in 2007-2009, but with changes to the treatment of high-income units. Prior to matching, the CPS and PUF records are divided into mutually exclusive groups, and matching is allowed only within each group. The groups are defined by the following "blocking variables":
Several additional constraints imposed on the matching algorithm further reduce the number of PUF records that are potential matches to a particular TRIM3 record. These constraints relate to:
The 2010 baseline changed the 2005-2009 practice of using the PUF to restore variation to top-coded CPS incomes. In previous years, the Census Bureau top-coded income amounts exceeding certain thresholds in order to preserve confidentiality, and replaced top-coded amounts with averages calculated for all top-coded individuals. However, in the 2011 CPS, the Census Bureau adopted a new procedure of "rank proximity swapping," in which individuals with income amounts above a threshold value have their amounts "swapped" with the value of another high-income person within a bounded interval. This ensures that no high-income person record contains the exact income data of that person, while preserving the distribution of values above the threshold.

Because this change to the data removes the need to restore variation, we altered our statistical match procedure to no longer import income values from the PUF; CPS income variables are now used throughout the baseline. Although it is no longer necessary to replace top-coded CPS income variables with income variables from the PUF, we continue to create five clones of high-income households in order to allow greater variation in the capital gains and deductions obtained through the statistical match with the PUF.

Once the match procedure has identified the set of PUF records that can be matched to a given CPS tax unit, a PUF record is selected using a "minimum distance" function. This procedure differs between units that are "high income" (that is, with one or more income amounts above the threshold for rank proximity swapping) and lower-income units.

If the tax unit is not treated as a "high income" unit for the purpose of the match, then the distance function is computed based on AGI. Capital gains and IRA and Keogh contributions are obtained from the PUF record being considered for the match. The capital gains are added to, and the IRA and Keogh contributions are subtracted from, the preliminary AGI calculated by TRIM3.
The resulting AGI is compared to the AGI of the available PUF records, and the record with the least difference in AGI is selected.

For "high income" tax units, the minimum distance function is computed by examining the difference between the CPS tax unit and the PUF record for each of ten income items reported on both the CPS and the PUF (wages, business income, farm income, interest, pensions, dividends/estates/trusts, rents/royalties, total social security benefits, unemployment compensation, and alimony received). The PUF record with the least absolute difference across these income items is selected as a match.

Once a PUF record has been selected, variables from that record are assigned to the CPS tax unit. The weight of the PUF record is then reduced by the weight of the CPS tax unit. Once the weight for a PUF record has been reduced to zero, it cannot be matched to additional CPS tax units.

Because the variables obtained through the statistical match for an individual tax unit come from a single PUF record, we are limited in our ability to align any specific variable to a target. However, we do make some adjustments. We adjust the capital gains and deduction dollar amounts to reflect the change in average dollar amounts between the year of the PUF data and the tax year being simulated, and we make minor adjustments to increase or decrease the likelihood of selecting a PUF record based on whether the record has income or deduction values from particular sources (such as capital gains). We also perform some minimal alignment by adjusting the dollar amounts used to disallow matches to PUF records with very large income or deduction amounts.

The 2010 baseline used the 2007 PUF, and the 2011-2013 baselines used the 2008 PUF.
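
The record selection and weight reduction steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TRIM3 implementation: the names `CpsUnit`, `PufRecord`, `distance`, and `match_unit` and all field names are hypothetical, the "high income" distance is assumed to be the sum of absolute differences over the ten income items, and flooring a PUF weight at zero when the CPS weight exceeds it is also an assumption. The candidate list is assumed to have already been narrowed by the blocking variables and the additional constraints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The ten income items reported on both files (key names are hypothetical).
INCOME_ITEMS = (
    "wages", "business", "farm", "interest", "pensions",
    "dividends_estates_trusts", "rents_royalties",
    "social_security", "unemployment", "alimony",
)


@dataclass
class PufRecord:
    rec_id: str
    weight: float
    agi: float
    capital_gains: float
    ira_keogh: float
    income: Dict[str, float] = field(default_factory=dict)


@dataclass
class CpsUnit:
    unit_id: str
    weight: float
    prelim_agi: float          # preliminary AGI calculated before the match
    high_income: bool
    income: Dict[str, float] = field(default_factory=dict)


def distance(unit: CpsUnit, puf: PufRecord) -> float:
    """Distance between a CPS tax unit and one candidate PUF record."""
    if unit.high_income:
        # Assumption: sum of absolute differences over the ten income items.
        return sum(
            abs(unit.income.get(k, 0.0) - puf.income.get(k, 0.0))
            for k in INCOME_ITEMS
        )
    # Lower-income units: add the candidate's capital gains to, and subtract
    # its IRA/Keogh contributions from, the preliminary AGI, then compare
    # the result with the candidate's own AGI.
    adjusted_agi = unit.prelim_agi + puf.capital_gains - puf.ira_keogh
    return abs(adjusted_agi - puf.agi)


def match_unit(unit: CpsUnit, candidates: List[PufRecord]) -> Optional[PufRecord]:
    """Select the minimum-distance candidate and reduce its remaining weight."""
    available = [p for p in candidates if p.weight > 0]
    if not available:
        return None
    best = min(available, key=lambda p: distance(unit, p))
    # Assumption: the remaining weight is floored at zero.
    best.weight = max(0.0, best.weight - unit.weight)
    return best
```

In this sketch, a PUF record whose weight has reached zero is filtered out of `available`, so a later CPS tax unit falls through to its next-closest candidate.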